Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation

Abstract:

1- INTRODUCTION

Assessing watershed sediment load is necessary for controlling soil erosion and reducing the potential for sediment production. Divergent estimates of sediment amounts, together with the lack of long-term measurements, limit access to reliable data series of erosion rate and sediment yield. Observed suspended sediment load data can therefore be used to estimate soil loss in the upstream catchment. One valid way to estimate soil erosion is to combine the recorded data of hydrometric stations with catchment characteristics, which can improve the accuracy of predictions. For this purpose, identifying similar sub-watersheds according to climatic, physiographic, geologic, and land-use characteristics could be useful in erosion control operations.

2- THEORETICAL FRAMEWORK

Clustering is introduced as a key step in estimating the amount of sediment in ungauged areas. Various methods and techniques have been used to determine the best number of clusters. However, studies that apply different clustering methods and select the best one are rare. The objective of the present study is therefore to determine the most important variables in sediment production and to compare the Single linkage, Ward, and β-Flexible methods for clustering the sub-watersheds of the Gorganroud and Qareh Sou river basins in Golestan Province.

3- METHODOLOGY

The Gorganroud and Qareh Sou watersheds are located in the northeastern part of Iran. Seventeen hydrometric stations with 24 years (1986–2010) of recorded discharge and suspended sediment load data were selected. The Grubbs and Beck method was used to identify outliers in the measured discharge data. The correlation method was used to fill missing data in the time series. The normality of the discharge and suspended sediment data was tested with the Kolmogorov-Smirnov test in order to choose the appropriate trend analysis method.
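As an illustration of the normality check mentioned above, the following minimal Python sketch computes a Kolmogorov-Smirnov statistic against a normal distribution fitted to the sample. The function names are hypothetical, and the abstract does not state which software or settings the authors used. The threshold 1.36 / sqrt(n) is the usual large-sample 5% critical value for a fully specified distribution; because the mean and standard deviation are estimated from the same sample here, the comparison is only approximate.

```python
import math
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(values)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def looks_normal(values, coefficient=1.36):
    """Assumed decision rule: accept normality when D <= 1.36 / sqrt(n)."""
    return ks_statistic_normal(values) <= coefficient / math.sqrt(len(values))
```

In this sketch, a series for which `looks_normal` returns `True` would go to linear regression for trend analysis, and any other series to a non-parametric trend test.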
Linear regression and the Mann-Kendall tau method were used for trend analysis of the data with normal and non-normal distributions, respectively. The autocorrelation function (ACF) test was used to check for serial dependence within each data series. A set of 38 factors from five main categories was investigated to identify the independent variables controlling sediment yield. Principal Component Analysis (PCA) was used to determine the most effective variables. To find the best classification method, three techniques (Single linkage, Ward’s, and β-flexible) were examined in the study area. Single linkage, also called nearest neighbor, is a simple clustering method in which object pairs form clusters hierarchically, starting from the most similar pair and proceeding in descending order of similarity. Ward’s algorithm is one of the most frequently used techniques for regionalization studies of hydrological and climatological factors. β-Flexible is a generalized hierarchical method that forms groups by computing the distance from an external object (a point) to the group. Many indices have been developed to examine the validity of clustering techniques based on finding an optimal partitioning. In the present study, the pseudo-F and Dunn’s indices were used to assess the accuracy of the clustering algorithms. Accurate clustering means having non-overlapping partitions. One of the most commonly used criteria for selecting the number of groups is maximization of the pseudo-F statistic. This statistic is based on the assumption of a multivariate normal distribution of the data.

4- RESULTS

All data series of the 17 sub-watersheds in the Gorganroud and Qareh Sou basins were tested with the different clustering algorithms. Two data series showed autocorrelation, detected by the ACF test. Two data sets had trends according to Kendall’s test. Therefore, 13 sub-watersheds remained for the final classification.
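The screening that reduced the 17 series to 13 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the Mann-Kendall variance omits the correction for tied values, only the lag-1 autocorrelation is checked, and the thresholds (|Z| > 1.96 and |r1| > 1.96 / sqrt(n), both roughly a 5% level) are assumed rather than taken from the abstract.

```python
import math


def mann_kendall_z(series):
    """Standardized Mann-Kendall statistic Z (assumption: no tie correction)."""
    n = len(series)
    # S counts increases minus decreases over all ordered pairs (i < j).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation coefficient r1."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    numer = sum((series[t] - m) * (series[t + 1] - m) for t in range(n - 1))
    return numer / denom


def passes_screening(series, z_crit=1.96):
    """True when neither a trend nor lag-1 autocorrelation is flagged."""
    n = len(series)
    has_trend = abs(mann_kendall_z(series)) > z_crit
    has_autocorr = abs(lag1_autocorrelation(series)) > z_crit / math.sqrt(n)
    return not has_trend and not has_autocorr
```

Under these assumptions, a station's series would be kept for clustering only when `passes_screening` returns `True`.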
The 38 independent variables were calculated and screened with PCA. Variables with similar effects on sediment yield were grouped into 7 components, which were selected according to the amount of variance they explained. The results of PCA and the representative variable selected for each component are given in Table 1.

Table 1: Results of Principal Component Analysis of the variables affecting sediment yield in the Gorganroud and Qareh Sou Watersheds, Iran

Component   Eigenvalue   Variance (%)   Cumulative variance (%)   Representative variable
1           7.99         21.60          21.60                     Main stream length
2           6.82         18.43          40.03                     Flow discharge with a 10-year return period
3           5.97         16.12          56.16                     Percentage of forest area
4           5.25         14.18          70.33                     Percentage of agricultural land area
5           4.98         13.47          83.81                     Drainage density
6           2.56         6.92           90.73                     Percentage of permeable formations area
7           1.95         5.28           96.01                     Time of concentration

The results of the Ward’s, Single linkage, and β-flexible hierarchical techniques are summarized in Table 2.

Table 2: Results of the hierarchical clustering techniques in the Gorganroud and Qareh Sou Watersheds, Iran

Method           Number of clusters   Dunn coefficient   Pseudo-F
Single Linkage   2                    0.29               2.12
                 3                    0.45               3.50
                 4                    0.32               2.89
                 5                    0.43               3.30
Ward             2                    0.29               4.06
                 3                    0.19               2.73
                 4                    -                  -
                 5                    -                  -
β-Flexible       2                    0.29               3.57
                 3                    -                  -
                 4                    0.37               4.06
                 5                    -                  -

5- CONCLUSIONS & SUGGESTIONS

The results showed that the Single linkage method performed best with respect to the accuracy criterion, reaching the highest Dunn coefficient in Table 2 (0.45 with three clusters). Suspended sediment values were determined from measured discharge and the available sediment rating curves; the identified clusters can therefore be regarded as a reliable and appropriate grouping of the watersheds and as a useful tool in watershed management, particularly in the context of erosion and sedimentation.
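To make the two validity indices concrete, the following minimal Python sketch clusters a set of points with single linkage and then scores the partition. It assumes the common definitions, which the abstract does not spell out: the Dunn index as the smallest between-cluster distance divided by the largest within-cluster distance, and pseudo-F as the ratio of between-cluster to within-cluster mean squares. All function names are hypothetical, Euclidean distance is assumed, and the sketch expects at least two clusters with at least one of them holding two distinct points.

```python
import math


def single_linkage(points, k):
    """Merge the two clusters with the closest pair of members until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster distance."""
    min_between, max_within = math.inf, 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if labels[i] == labels[j]:
                max_within = max(max_within, d)
            else:
                min_between = min(min_between, d)
    return min_between / max_within


def pseudo_f(points, labels):
    """Between-cluster mean square divided by within-cluster mean square."""
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    grand = [sum(p[a] for p in points) / n for a in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        centroid = [sum(p[a] for p in members) / len(members) for a in range(dim)]
        between += len(members) * math.dist(centroid, grand) ** 2
        within += sum(math.dist(p, centroid) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))
```

Evaluating both indices for each method and each candidate number of clusters, and preferring the partition with the larger values, mirrors the kind of comparison reported in Table 2.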

similar resources

Turbidity Threshold Sampling for Suspended Sediment Load Estimation

Abstract: The paper discusses an automated procedure for measuring turbidity and sampling suspended sediment. The basic equipment consists of a programmable data logger, an in situ turbidimeter, a pumping sampler, and a stage-measuring device. The data logger program employs turbidity to govern sample collection during each transport event. Mounting configurations and housings for the turbidime...

Turbidity-controlled sampling for suspended sediment load estimation

Automated data collection is essential to effectively measure suspended sediment loads in storm events, particularly in small basins. Continuous turbidity measurements can be used, along with discharge, in an automated system that makes real-time sampling decisions to facilitate sediment load estimation. The Turbidity Threshold Sampling method distributes sample collection over the range of ris...

Turbidity-controlled suspended sediment sampling for runoff-event load estimation

For estimating suspended sediment concentration (SSC) in rivers, turbidity is generally a much better predictor than water discharge. Although it is now possible to collect continuous turbidity data even at remote sites, sediment sampling and load estimation are still conventionally based on discharge. With frequent calibration the relation of turbidity to SSC could be used to estimate suspende...

Determination of the Best Model to Estimate Suspended Sediment Load in Zaremrood River, Mazandaran Province

Extended abstract 1- Introduction The phenomena of erosion, sediment transport, and sedimentation have tremendously destructive effects on the environment and hydraulic structures. In general, sediment transport depends on river discharge, but the proposed equations carry serious errors. The estimation of suspended sediment load (SSL) is one of the most important factors in r...

Traffic state estimation using hierarchical clustering and principal components analysis: a practical approach

Traffic state estimation and prediction are fundamental requirements for automatic control of urban road traffic with both adaptive traffic lights and variable message signs. For that, collecting of actual traffic data is necessary. This paper deals with the combined application of principal components analysis (PCA) and hierarchical cluster analysis (HCA) for the specification of the needed nu...

Determination of the Best Hierarchical Clustering Method for Regional Analysis of Base Flow Index in Kerman Province Catchments

The lack of complete coverage of hydrological data forces hydrologists to use the homogenization methods in regional analysis. In this research, in order to choose the best Hierarchical clustering method for regional analysis, base flow and related index were extracted from daily stream flow data using two parameter recursive digital filters in 43 hydrometric stations of the Kerman province. Ph...

Journal title

volume 6 issue 4

pages 47-67

publication date 2017-02
